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3D SCANNING USING SHADOWS 



Cross Reference To Related Applications 
This application claims the benefit of the U.S. Provisional 
Applications No. 60/169,103, filed 12/6/99; 60/169,102, filed 
12/9/99; and 60/183;L72, filed on 02/17/00. 

Background 

Three-dimensional scanning has recently become popular. 
However, many of the scanners used for such scanning trade-off 
between cost of acquisition and/or use, accuracy, ease-of-use 
ly and/or speed of acquisitions. Many commercial 3-D scanners 
Ul emphasize the accuracy over all other parameters. 



illumination system in a fixed installation with controlled 
lighting. The object is transported using a motorized transport 
system. This can significantly increase the cost of the system. 
Moreover, this can make the system difficult to use in many 
installations, such as outdoors, where lighting may be difficult 
to control. 



Scanners such as the Cyberware scanner may use an active 
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Summary 

The present application teaches a system which allows 
scanning of 3D objects using shadow imaging. 
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In an embodinTSTit , the system requires only"^ computer and a 



camera. The computer may be programmed to image shadows, and 
determine three dimensional information of an object from the 
imaged shadows. 

Brief Description of the Drawings 
These and other aspects will now be described in detail 
with reference to the accompanying drawings, wherein: 

Figure 1 shows a general set up showing three-dimensional 
objects and the shadow cast on those objects; 

Figure 2 shows the geometric principle of the technique 
used to recover three-dimensional information; 

Figure 3 shows a flowchart of operation, showing the 
operations that the computer carries out to recover this three- 
U dimensional information; 

£3 Figure 4A and 4B show camera calibration; 

Figure 5 shows a three dimensional calibration; 
Figure 6A, 6B and 6C show using a pencil of known height to 
calibrate a position of the light source; 

figure 7 shows a summary of global geometry for a system 
with dual background planes and uncalibrated light source; 

figure 8 shows a summary of global geometry with a single 
background planes and uncalibrated light source; 

figure 9 shows scanning using a reference plane; 
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figure 10 shows depth propagation along multiple edges; 

figure 11 shows a flowchart of operation of multiple images 
being obtained; 

figure 12 shows the scanning set up with a light source and 
a straight edge at dual positions; 

figure 13 shows will coordinate transformation for the 
system; and 

figure 14 shows double-edged intersection using dual light 
sources . 

:5 Detailed Description 

[J The present application describes capturing three- 

dimensional surfaces using a simplified system. In one 
m embodiment, the lighting can be "weakly structured", i.e, any 

E 

light source can be used. 

The general principle is shown in Figure 1. The scene to 
be imaged 100 is shown with a shadow 105 on the scene. The 
shadow 105 is produced by an edge, e.g. a straight edge such as 
a pencil or a stick, and is caused to move across the scene 100. 
The human operator produces the movement. 

A camera 110 monitors the image of the scene including the 
shadows, and produces an output which is sent to a processor 
120. The processor estimates the three-dimensional shape of the 
scene from the sequence of images of the shadow as deformed by 
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three-dimensional^Kements of the scene. The processing 
operates to extract scene depth at least at a plurality of 
pixels in the image, more preferably at every pixel in the 



be of any size. 

More detail about the acquisition, including the geometry 
of the acquisition, is shown in Figure 2. A light source, shown 
approximated as point source 200, illuminates the scene 199. 
The intersection of the point source and the leading edge of the 
pencil define a plane A(t), at each time instant. Therefore, 
the boundary of the pencil's shadow on the scene comprises the 
intersection between the plane A(t), and the surface of the 
object . 

In Figure 2, the plane Dh represents the horizontal plane, 
and the plane Flv represents a vertical plane orthogonal to Dh. 
This vertical plane is optional. In the Figure 1 setup for 
example, the plane nv is removed. The position of the plane Dh 
in the camera reference plane will be known from calibration as 
explained further herein. The position of Flv is also inferred 
from a projection Ai at the intersection line between Dh and Flv. 

The end goal is to obtain three-dimensional coordinates of 
each point P in the three-dimensional scene. This may be done 
one pixel at a time. The three-dimensional location of the point 
P in space will be estimated corresponding to every pixel p at 



image. It should be understood, however, that the pixels could 
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e image obtained by the camera. Effectively, 

the user projects a succession of N shadows on the scene, 

thereby generating N shadow edges ^i, where 1=1. . . N. The 

points on this edge are estimated through triangulation, as 

described herein. 

The shadow time t is defined as the time when the shadow 
boundary passes by a given pixel along a line Xc- Ot represents 
the corresponding shadow plane at that time t. The leading edge 
of the shadow is represented by the value A. 

The projection of A on the image plane A is obtained by the 
camera at 305. This is done by a temporal processing operation 
of estimating the shadow time ts(Xc) at each pixel Xc. This 
temporal processing is followed by a spatial processing, in 
which the projection of A that is seen by the camera is 
represented by values A. Therefore, the two portions of the 
shadow projected on the two planes Dh and FIv are visible on the 
image as the lines Ah and Av. After extracting these two lines, 
the location in space of the two corresponding lines Ah and Av 
are obtained at 310. This can be done by forming a plane 
between the camera and the A lines. In Figure 2, the camera 
origin is shown as Oc- A plane between Oc and the A is can be 
obtained. These planes (Oc, A(t), and (Oc, A(t)) are intersected 
with the image planes Flh and EIv respectively. The shadow plane Fit 
is then found at 315. If the vertical plane has been used for 
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scanning, then the shadow plane is the plane defined by the two 

non collinear lines A, and A. at 315. ,lf the vertical plane has 
not been used for scanning, however, the shadow plane may still 
be inferred from the only available line Ah and the position of 
the point light source S. in this latter case, point light 
source needs to be at a known and fixed location in space. 

At 320, the actual point P corresponding to the pixel Xe is 
retrieved by triangulating n of T with the optical ray Oc from 
the camera. 

If the light source S is at a known location in space, then 
the shadow plane n(t) may be directly inferred from the point S 
and the line A,. Consequently, no calculation of the additional 
plane n(t) is not required. Therefore, two different 
embodiments are contemplated: a first having two calibrated 
planes (O, and n.) and an uncalibrated light source, and the 
second having one calibrated plane and a calibrated light 



source 



Camera Calibration 

Camera calibration can be used to recover the location of 
the ground plane n, and the intrinsic camera parameters. The 
intrinsic camera parameters may include focal length, principal 
point, and radial distortion factor. A first embodiment of the 
calibration technique herein uses a planar checkerboard pattern. 
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Figures 4A and 4B^now the setup which includes a planar 
checkerboard pattern 400 placed on the imaging area, e.g., the 
ground or table surface, in a general location of the objects to 
be scanned. The camera 402 is pointed at this general area. 
The camera 402 captures an image shown in Figure 4B. This image 
is used to infer the intrinsic and extrinsic parameters of the 
camera . 

The projections on the image planes of the known grid 
corners are matched to the expected projection that is directly 
on the image. This is described in more detail in Tsai "A 
Versatile Camera Calibration Technique for High Accuracy 3-D 
Machine Vision Metrology using off-the-shelf TV cameras and 
Lenses". A first order symmetric radial distortion model is 
used for the lens. When this technique is used with a single 



image, the principal point, that is the intersection of the 
optical axis with the image plane, cannot be easily recovered. 
This principal point is therefore assumed to be identical with 



the image center. A full camera model may be used which 
integrates multiple images of the planar grid at different 
locations in space with different orientations, e.g., using 
three or four images. 

It has been found that the quality of reconstruction is 
relatively insensitive to errors in the principal point 
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, either of the calibrationXechniques can 

be used. 

As described above, when a single reference plane, e.g. Dh, 
is used for scanning, the location of the light source must be 
known in order to infer the shadow plane location lit. Another 
embodiment allows calibration in the light source using an item, 
e.g. a pencil or a stake of known length. The operation and 
geometric theory is shown in Figures 6A-6C. In Figure 6A, the 
pencil is shown standing on the reference plane Ilh. The camera 
605 observes the shadow of the pencil 610 as projected on the 
ground plane. The acquired image is shown in Figure 6B. Two 
points: b and ts are found in this acquired image, as shown in 
Figure 6B. 

Figure 6C shows a geometrical diagram. The points b and ts 
can be triangulated to obtain the position in space of pencil 
points B and T using the known the height of the pencil H. 
Effectively, the coordinates of the pencil tip T can be directly 
inferred from B. As shown, this is done by intersecting optical 
rays (0c,b) and (Oc,ts) with the Ilh that is known from camera 
calibration. The point light source S therefore must lie in the 
plane A = (T, Ts) in space. This by itself yields a first linear 
constraint on the position of the light source. 

By taking a second view, with the pencil at different 
locations on the plane, a second independent constraint with 
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another line A prime can be obtained. A closed form solution 

for the 3-D coordinates of S may then be derived by intersecting 

the two lines A and A prime using a least squared combination. 

Two views may provide sufficient information. However 
since the problem is linear, the estimation can be made more 
accurate by obtaining more than two views. 

Operation 305 in Figure 3 requires determining lines of 
intersection between the shadow plane 11 of T. and the two planes 
n of h and n of v. This may be done by the spatial processing 
of localizing edges of the shadow that are directly projected 
onto orthogonal planes at each discrete time t to form the set 
of all shadow planes n of T. Temporal processing also, and also 
estimating the time ts as the shadow time, where the edge of the 
shadow passes through any given pixel Xc=Xc/yc in the image. 

These processing tasks allow finding the edge of the 
shadow. However, the search domains for the spatial and 
temporal processing tasks may be different. The spatial task 
operates on the spatial coordinates or image coordinate system, 
while the temporal task operates on the temporal coordinates 
system. One aspect of the present system enables making the two 
search procedures more compatible so that at any time to when 
the shadow passes through a pixel Xc, the two searches still find 
the exact same point (x, y, tO) in space-time. 
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A technique iTdescribed herein called spatio-temporal 

thresholding. This thresholding is based on the observation 

that, as the shadow is scanned across the scene, each pixel x,y 

sees its brightness intensity going from an initial maximum 

value Imax down to a minimum value In,in. The pixel then returns 

to its initial value as the shadow goes away. This profile is 

characteristic even when there is a non-neglible amount of 

internal reflections in the scene. 

For any given pixel Xc = (x,y), define In,in(x,y) and 
Imax(x,y) as its minimum and maximum brightness throughout the 
entire sequence: 

Lm(x,y)= ^ {I(x,y,t)} 

The shadow is defined as to be the locations (in space- 
time) where the image I(x,y,t) intersects with the threshold 
image Isnadow (x, y) ; defined as the mean value between Imax(x,y) and 
Imin (x, y) : 



This may be also regarded as the zero crossings of the 
difference image AI(x,y,t) defined as follows: 
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AI(x,y,t) = I(x,y,0-I^ 



shadow 



The concept ^of dual space may assist in certain 
calculations . 

SHADOW PLANE ESTIMATION 11 (t) 

Denote by co {t) , Av(t) and A^{t) the coordinate vectors of 
the shadow plane n(t) and of the shadow edges /lh(t) and Av(t) at 
time t. Since Ah(t) is the projection of the line of 
intersection Ah(t) between n(t) and Rh, then (t) lies on the line 

passing through sTh with direction /lh(t) in dual-space (from 
proposition 1 of section 2.2.2). That line, denoted Ah(t), is 
the dual image of Ah(t) in dual-space as described above (see 

section 2.2.2). Similarly, O) (t) lies on the line Av(t) passing 

through with direction Av(t) (dual image of Av(t)). 
Therefore, in dual-space, the coordinate vector of the shadow 

plane O) {t} is at the intersection between the two known lines 
Ah(t) and Av(t). In the presence of noise, these two lines might 
not exactly intersect (equivalently, the 3 lines Ai, Ah(t) and 
Av(t) do not necessarily intersect at one point on the image 

plane, or their coordinate vectors /li, /lh(t) and /lv(t) are not 

coplanar) . However, one may still identify o) (t) with the point 
that is closest to the lines in the least-squares sense. The 
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complete derivations for intersecting a set of lines in space 

may be found in section 4.1.2. When interesting the two lines 

Ah(t) and Av(t) in space, the solution reduces to: 



with 



where 



^(0=^(6>,(0+^2(0X 



where A is a 2 x 2 matrix and b is a 2-vector defined as follows 
(for clarity, the variable t is omitted) : 



<XhyXh> ~'<AhyAv> 



Note that the two vectors coiit) and 0)2(2) are the 
orthogonal projections, in dual-space, of o) (t) onto Ah(t) and 
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Av(t) respectively^ The norm of the difference between these two 
vectors may be used as an estimate of the error in recovering 
n(t). If the two edges Ah(t) and Av(t) are estimated with 
different reliabilities, a weighted least square method may 
still be used. 

Using the additional vertical plane riv enables extracting 
the shadow plane location without requiring the knowledge of the 
light source position. Consequently, the light source is 
allowed to move during the scan. This may allow use of natural 
light, e.g. the sun, for example. 

When the light source is of fixed and known location in 
space, the plane riv is not required. Then, one may directly 
infer the shadow plane, position from the line Ah(t) and from the 
light source position S: 



where Xs=[Xs Yg Ys]"^ is the coordinate vector of the light source 
S in the camera reference frame. In dual-space geometry, this 



where 



5€n(/)o<cr(/),X.> = loa,=^^4^ 

<Xh(t),Xs > 



(6.10) 



JB^ Attorney Dock^^o. 06618/565001 

corresponds to th^^ntersecting the line Ah(t) with plane S, dual 

image of the source point S. Notice that <>lh(t)/Jfs> = 0 
corresponds to the case where the shadow plane contains the 
camera center projection Oc- This is singular configuration that 

may make the triangulation fail ( || (t ) || -> oo ) . This approach 
requires an additional step of estimating the position of S 
using light source calibration. This reconstruction method was 
used in experiments 2 and 3. 

The quantity 1 - <0)^^, X s> reduces to hg/dh where hs and dh 
are the orthogonal distances of the light source S and the 
camera center Oc to the plane rih- 

Since co^i is the coordinate vector of plane Oh, the vector 

Wh = dh6>h is the normal vector of the plane rih in the camera 
reference frame. Let P be a point in Euclidean space (E) of 

coordinate vector X . The quantity dh - <n\^,X> is then the 
(algebraic) orthogonal distance of P to rih (positive quantity if 
the point P is on the side of the camera, negative otherwise) . 

In particular, if P lies on Hh, then <n^,X> = dh, which is 

equivalent to <0)h,X> = 1. The orthogonal distance of the light 

source S to Hh is denoted hg . Therefore hs = dh - <n^,X>f or 

equivalently 1 - <C0Yir X s> = hs/dh. 

According to that claim, the constant ah of equation 6.10 
may be written as: 
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Kid, \ld, 



< Xh (/), X. > <Xh (0, Xslh^> 



This expression highlights the fact that the algebra 
naturally generalizes to cases the light source is located at 
infinity (and calibrated) . Indeed, in those cases, the ratio 

Jfs/hs reduces to ds/sxn ^where ds is the normalized light 
source direction vector (in the camera reference frame) and ^the 
elevation angle of the light source with respect to the Ilh . In 

dual-space, the construction of the shadow plane vector co (t) 

y remains the same: it is still at the intersection of Ah(t) with 

fy 

in S. The only difference is that the dual image S is a plane 

crossing the origin in dual-space. The surface normal of that 

plane is simply the vector d^- 

C3 

fn 
* 

□ TRIANGULATION 

Once the shadow time ts(JCc) is estimated at a given pixel Xc 
= [Xc yc 1]^ (in homogeneous coordinates), the corresponding 

shadow plane n(ts(Xc)) is identified (its coordinate vector 

^c= <i> (ts ( Xc) ) ) by triangulation at 320. The point P in space 

associated to Xc is then retrieved by intersecting n(ts(JCc)) 

with the optical ray (Oc, Xc) : 
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<(Oc,Xc > ' ' <a)c,Xc>' 



If Xc - [Xc Yc Zc]'^ is defined as the coordinate vector of P in 
the camera reference frame. 

Notice that the shadow time t^ix^) acts as an index to the 
shadow plane list n (t) . Since Uix^) is estimate at sub-frame 
accuracy, the plane n(ts(Xc)) (actually it coordinate vector a^) 
results from linear interpolation between the two planes U (to - 
1) and n (to) if to-l<ts(^c)< to and to integer: 

where At = to-ts ( JCc) , 0 < A t<l . 

Pixels corresponding to occluded regions in the same cannot 
provide substantive information. Therefore, only pixels that 
have a contrast value larger than a predetermined threshold are 
used. In the experiments giving herein, where intensity values 
are encoded between zero and 255, this threshold may be 30. The 
threshold may also be proportional to the level of noise in the 
image. 
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Once the shad^ time is estimated at any given pixel, the 



shadow plane can be triangulated from its coordinate vector. 
The point in space (P) associated to the given pixel is then 
retrieved by intersecting the shadow plane with this optical 
ray. This is shown in Figure 2. More specifically. 



{cOcXc^ 



the shadow time therefore ask as an index to the shadow plane 
list. This effectively provides range data to the actual 
object . 

The recovered range data is used to form a mesh by 
connecting neighborhood points in triangles. Connectivity is 
directly given by the image. Two vertices become neighbors if 
their corresponding pixels are neighbors in the image. 
Moreover, since each vertex corresponds to a unique pixel, the 
texture mapping can be relatively easily carried out. 

Figure 7 shows the global geometrical principal of the 
reconstruction technique in normal Euclidean space (left) and 
"dual space" (right) . This Figure shows the correspondence 
between Euclidean space and dual space for objects, e.g. lines 
and points on the image plane, as well as objects in 3-D which 
include planes, lines and points in space. 
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e calibration of the vertical plane Ilv is 

also illustrated in the dual-space diagram: its coordinate 
vector is at the intersection of the Ai and the set of plane 
vectors orthogonal to 6>h (defining a plane in dual-space) . The 
line Ai is at the intersection of the two planes Ilh and Ov, and 
it dual image Ai is uniquely defined by the horizontal plane 

vector o)Yi and the vector Ai, coordinate vector of the line Ai 
observed on the image plane. This calibration process is 
described above* 

Once 0)^ is known, the shadow plane vector o? (t) associated 
to the shadow edge configuration at time t is at the 
intersection between the two lines Ah(t) and Av(t), dual images 
of Ah(t) and Av(t) . Those two dual lines are defined by the two 

reference plane vectors coy^ and o)^ and the direction vectors 

/lh(t) and /lv(t) (vector coordinate of the two image lines Ah(t) 
and Av(t)). This processing step is described in detail. 

The final step consisting of identifying the point P in 
space by intersecting the optical ray (Oc^p) with the shadow 
plane Yl is also illustrated on the dual-space diagram. In dual- 
space, that stage corresponds to finding the dual image of P 

that is the unique plane in dual-space containing the point co (t) 
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(shadow plane vector) with orthogonal vector Xc (homogeneous 

coordinate vector of the image point p) . 

An alternative embodiment uses a single reference plane (llh 
without riv) with a calibrated light source is summarized on 
Figure 8. This technique uses a different procedure of 

estimating the shadow plane coordinate o> (t) . In that version, 

the vector O) (t) is at the intersection of the dual line Ah(t) 
and the dual image of the light source S. The triangulation 
operation remains unchanged. 

The point P in space is determined by intersecting the optical 
ray Oc, P with the shadow plane 11. This corresponds to finding 
the dual image of P that is the unique plane in dual space that 
contains the point including the shadow plane vector with the 
orthogonal vector. This is done by triangulation as described 
above . 

The alternate setup uses a single reference plane with a 
calibrated light source as shown in Figure 8. In the Figure 8 
setup, a different procedure is used to estimate the shadow 
plane coordinate vector. 

The accuracy of this system is not necessarily increased by 
decreasing the scanning speed. However, the scanning speed must 
be sufficiently slow to allow each temporary pixel profile to be 
sufficiently sampled. A sharper shadow edge will require slower 
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temporal profile at each pixel can be 

properly sampled. The reality, however, is that the speed of 

moving the shadow is really limited by the response speed of the 

image acquisition element. 

In this system, range data can only be retrieved for pixels 
that correspond to regions in the scene that are illuminated by 
the light source and imaged by the camera. Better coverage of 
the scene may be obtained from multiple scans of the same scene 
keeping the light source at different locations each time and 
keeping the camera position fixed. 

The previous system has described accurately characterizing 
3-D surface based on projection of shadows on a known reference 
plane. Another embodiment describes operating without using a 
background plane as the reference plane, and instead using a 
calibrated light source. 

A summary of the scanning scenario for a flat reference 
plane is shown in Figure 9. In the first embodiment, the plane 
rid is used as a reference plane for scanning. The position of 
the light source S is known and the position of the plane lid is 
also known from calibration/setup. 

A curved edge shadow ^ is projected on the image during 
scanning. All points in space P are estimated as the curved 
edge moves across the image, by estimating from the points p on 
^. n denotes the corresponding shadow plane. The image 930 
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corresponds to that image which is seen by the camera. The 
corresponding points A and B in the scene may be found by 
intersecting rid with the optical rays (Oc,a) and (Oc,b) from the 
image. The shadow plane n may then be inferred from the three 
points in space: S, A and B. 

For a given position of the shadow, once the shadow plane n 
is identified, the 3-D image of the entire shadow edge ^ can be 
determined by geometric triangulation using, among other things, 
the known position of the reference plane 11^. This embodiment 
describes finding the 3D image, using a similar overall 
technique to that described above, but without knowing D. 



number of issues about the image. First, depth information at 
two distinct points on an edge propagates along the entire edge. 
Also, if depths Z of two distinct points a and b of this shadow 
edge ^ are known, then the depth at any other point in the image 
Zp can also be found. Depth at two distinct points propagates 
over the whole image. 



Let Xa and Xb be the homogeneous coordinate vectors of the 

two points a and b on the image plane Xp, = [xa Va l]*^. Then if 
the two depths Za and Zb are know, then so are the full 
coordinate vectors of the ' associated points A and B in the 3D 

scene: X = ZaXa and Xb = ZbXb- Therefore the associated shadow 



: . 3 



The extension to this embodiment takes into account a 
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plane n is the unique plane passing through the three points A, 

B and S. Once n is recovered, any point p along the edge £ may 
be triangulated leading to Zp. 

The depths of two points a and b may be known if they lie 
on the known reference plane 11^. Consequently, following 
Property 1, depth information at a and b propagate to every 
point p along the edge G. 

Depths can also propagate from edge to edge. This concept 
is used here to allow the depths to propagate over the entire 
image. According to the present embodiment, knowledge of the 
depth of three distinct points in the image can be used to 
recover the entire scene depth map. Hence, knowledge of these 
three distinct points can be used in place of knowledge of the 
reference plane rid- 

A, b and c are defined as three distinct points in the 
scannable area of the image. For purposes of this explanation, 
it can be assumed their depths Za, Zb and Zc are known. By 
appropriate shadow scanning, the depth of any point p in the 
image can be obtained. 

This. can be demonstrated using a constructive example. A 
shadow edge ^1 that goes through points a and B can be projected. 
A second shadow edge ^2 can be projected through points a and c. 
This is shown in Figure 10. The two points a and b on Gi are of 
known depth, and therefore the depth of every point along that 
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edge can be similaWy computed. Analogously, for £2, the depth 

of every point along that edge can be computed. 

For every point p on the image, there can be an edge ^p that 
passes through p and intersects both €1 and £2 at distinct points 
Pi and P2 which points are each different than a. Since Pi and P2 
each lie on the two known edges at Gi and £2, the depths of ^1 
and ^2 must also be known. Therefore, since two points on the 
shadow edge Gp. are known, the depth of every point along £,p may 
also be computed. In particular, the depth of the point p can 
be computed as shown in Figure 10. Moreover, if points between 
the edges intersect, then the depth information can propagate 
from edge to edge. 

As stated above, this means that knowledge of the depth at 
three distinct points in the image becomes sufficient to recover 
the entire scene depth map. To propagate depth information from 
{a,b,c} to p uses intersecting points pi and p2 of the shadow 
edges . 

The system uses edges ^. An edge ^ is an isolated edge if 
and only if it does not intersect with at least two other edges 
on the image. Depth information can not be propagated to any 
isolated edge from the rest of the edges. Isolated edges, 
therefore, can not be used in this system. 

This embodiment follows the summary flowchart of Figure 11, 
using the setup shown in more detail in Figure 12. In Figure 




Attorney Dock* 




o. 06618/555001 



11, a number of summary operations are described, which are each 
described in further detail herein. The hardware of Figure 12 
shows a point source at S, and a straight edge producing 
element, e.g. a stick 1200 being shown in two different 
positions. Each different position projects a shadow onto the 
3-D scene 12. A camera at 1220 receives a view of the image 
plane, and uses that view to reconstruct details of the 3-D 
image . 

At 1110, a set of shadow images is obtained. The shadows 
are extracted, and their intersections are computed. 

Let rii be the i^^ shadow plane generated by the stick (i = 

— i i i rr, 

1,..., N), with corresponding plane vector co i = w w ] (in 



dual space) . For all vectors Q)t to be well defined, it is 
required that none of the planes Di contain the camera center Oc- 
Denote by Gi the associated shadow edge observed on the image 

plane. The N vectors coj constitute then the main unknowns in 
the reconstruction problem. Once those vectors are identified, 
all edges can be triangulated in space. Therefore, there is 
apparently a total of 3N unknown variables. However, given the 
scanning scenario, every shadow plane Di must contain the light 

source point S. Therefore, Xs = [Xs Ys Zs]^ the light source 
coordinate vector in the camera reference frame (known), 
provides 



X 



y 



z 
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V/ 



= \,...,N, {(0^,Xs) = \ 



Equivalently, in dual-space, all shadow plane vectors cOi must 
lie on the plane S, dual-image on the light source point S. One 
may then explicitly use that constraint, and parameterize the 

— - / /' 

vectors cOi using a two-coordinate vector u i = [u u ] such 

X y 

that: 



a)i = Wui + fi>o = ^o)s\ cosijui +o)q 

I z * 

rU where coq, cOsi, and C0s2 three vectors defining the 

?3 parameterization. For example, if Xs ^ 0, one may then keep the 

Li I 

L — - / /' 

H last two coordinates of o) i as parameterization: u i = [w w ] , 

ru y ^ 

U 

^L: picking 0)si = [-Yg/Xs 1 0]^, O) s2 = [-Zs/Xs 0 1]^ and o)o = [1/Xs 0 

u 

0]'^. Any other choice of linear parameterization is acceptable 
(there will always exist one give that is S ^Oc) - In order to 

define a valid coordinate change, the three non-zero vectors Oo, 

cOsir and 6^32 niust only satisfy the three conditions (a) <(0o, X s> 

= 1, (b) <C0sifXs> = 0, (c) o)si ^fi>s2- In dual-space, {cOsi, 0)52) 
(or W) may be interpreted as a basis vector of the plane S and 
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6)0 as one particular point on that plane. Figure 13 shows the 
coordinate transformation . 

After that parameter reduction, the total number of unknown 

variables reduces to 2N: two coordinates u and u per shadow 



plane Hi. Given that reduced plane vector parameterization 

(called w -parameterization) , global reconstruction can be 
carried out. 

Intersecting points between the edges themselves depth 
information that propagates from edge to edge. These points 
provide geometrical constraints that may be extracted from the 
images. Therefore, a first operation may be carried out by 
studying the type of constraint provided by an elementary edge 
intersection. 

Assume that the two edges En and Sm intersect at the point 
Pk on the image (n m) , and let On and be the two associated 
shadow planes with coordinate vectors 0)^ and fi^rn as shown in 

Figure 12. Let Xk be the homogeneous coordinate vector of pk on 
the image plane, and Zk the depth of the corresponding point Pk 
in the scene. The two edges en and intersect in space at Pk 
if and only if the planes Iln and Ilm and the scene surface 
intersect at a unique point in space (Pk) . Equivalently, the 
depth Zk may be computed by triangulation using either plane Fin 
or riin. This condition translates into the two constraint 



X 



y 
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equations Zk = l/<(On, = l/<cym/^k> in dual-space which can be 

rewritten as follows: 



This unique equation captures then all the information that 
is contained into an elementary edge intersection. There is a 
very intuitive geometrical interpretation of that equation: Let 
Ak be the line of intersection between the two planes Fin and Yl^a, 



onto the image plane. Then, the vector A = cOn-cOm is one 
coordinate vector of the line Ak. Therefore, equation 7.3 is 

merely <Xkf/lk> = 0, which is equivalent to enforcing the point pk 
to lie on Xk (see Figure 12) . This equation has both advantages 
of (a) not explicitly involving Zk (which may be computed 
afterwards from the shadow plane vectors) and (b) being linear 

in the plane vectors unknowns cOn snd ^m- The same constraint 

may also be written as a function of Un and the two w- 

parameterization vectors of the shadow planes rin and Om: 




in space, and let Ak be the perspective projection, of that line 




= 0 



where j^k = W^Xk (a 2-vector) . Notice that this new equation 
remains linear and homogeneous in that reduced parameter space. 
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Let Np be the total number of intersection points Pk (k = 
l,...,Np) existing in the edge-web (the entire set of edges). 
Assume that a generic pk lies at the intersection of the two 
edges and Em(k){n(k) and m(k) are the two different edge 

indices) . 

The total set of constraints associated to the Np 
intersections may then be collected in the form of Np linear 
equations: 



'^k = \,...,N^, [y„un(k)-u„(k)) = (i 

which may also be written in a matrix form: 

AC7 = 0^^ 



where O^p is a vector of Np zeros, A is an Np x 2N matrix 
(function of the coordinate vectors only) and U is the 

vector of reduced plane coordinate (of length 2N) : U = [u^ 



1 



iT 1 1 2 2 , _ _ 

«2-^ - u^ ^^-^ • The vector U = {u,},.,,^. 

According to this equation, the solution for the shadow plane 
vectors lies in the null space of the matrix A. The rank of 
that matrix or equivalently the dimension of its null space are 
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identified at 1120, isolated edges and isolated group of edges 

are rejected. This forms an edge web which is fully connected, 

resulting in a set of an edges and Np intersection points Pk. 

An edge-web is fully connected if and only if it cannot be 
partitioned into two groups of edges which have less that two 
(zero or one) points in common. In particular a fully connected 
edge-web does not contain any isolated edge. Notice that under 
the condition only, depth information can freely propagate 
though the entire web. A normal scanning scenario is then 
defined as a scenario where the edge-web is fully connected and 
the total number of intersections is larger than 2N (this last 
condition will be relaxed later on) . 

In a normal scanning scenario, the rank of the matrix A is 
exactly 2N-3 (or alternatively, the null space of A is of 
dimension 3) . 

This is because in a normal scanning scenario, the 
reconstruction problem has at most three parameters. 
Consequently the dimension of the null space of A is at most 3, 
or equivalently, A is of rank at least 2N-3 in fact, usually 
exactly 2N-3. At 1130, a unitary seed vector Uo is calculated. 
The scene factor is calculated using singular value 
decomposition or SVD. 

For every solution vector U = {ui) to the equation, there 
exists three scalars a, p and A such that: 
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/ 

which are also obtained at 1130 with Wo = [oc P]*^. Conversely, 

for any scalars a, (5 and the vector U - {u±} given by the 

equation is solution of a linear system. The vector is 
called a ''seed" solution from which all solutions of the linear 
system may be identified. 

-n -0 

Any non-trivial solution vector U = {u } may be used as 

seed as long as it is one solution of the equation the solution 
vector (called the unitary seed vector) that satisfies the two 

extra normalizing conditions: (a) 2^._^w =0 (zero mean), (b) 




= 1 (unit norm) may be used. Those conditions assure a 



Q non trivial solution meaning that all Uj cannot be identical. 
□ _ _ 

The unitary seed vector satisfies the linear equation BU^ = 

6n2-o/ where b is the following augmented (Np -2) x 2N matrix: 



B = 



A 

10 10 
0 10 1 



1 0 
0 1 
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The last two rows of B enforce the zero-mean constraint, 

bringing the rank of B to 2N-1 (=(2N-3) + 2). Therefore, the 
dimension of null space of B is one, leading to U° being the 
unitary eigenvector associated to the unique zero eigenvalue of 
B. Consequently, a standard singular value decomposition (SVD) 
of B allows to naturally retrieve U°. Such a decomposition 
leads to the following relation: 



B = USV'' 

where U and V = [Vi V2 ... Vsn] are two unitary matrices of 
respective sizes (Np+2 ) x (Np+2 ) and 2N x 2N, and S is the (Np+2)x 
2N matrix of singular values. The unitary seed vector U° is 
then the column vector of V associated to the zero singular 
value. Without loss of generality, assume it is the last column 



fl „ +. 77 0 



0 _ -0 

vector: U - {u ^} = V2N. Alternatively, one may retrieve the 

same V matrix by applying the same decomposition on the smaller 
2N x 2N symmetric matrix C = B^B. Such a matrix substitution is 
advantageous because the so-defined matrix, C, has a simple 
block structure: 
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c = 



^2,1 ^2,2 ^2,Af 



where each matrix element Cij is of size 2x2. 

Two shadow edges can only intersect once (two shadow planes 
intersect along a line that can only interest the scene at a 
single point). All matrices Cij have the following expressions: 



C,=/2-Y.(,)yJ ifi^j 
k(ij) 

^ — —T 



where I2 is the 2x2 identity matrix. Observe that, every off- 
diagonal matrix element Cij(i?^j) depends only on the intersection 
point Pk(ij) between edges Gj and Gj . Every diagonal block Ci^i 
however is function of all the intersection points of Ei with the 
rest of the edge-web (the sum is over all points Pk(i,n)/ for n = 
(1,. . wN). 

Once the C matrix is built, V is retrieved by singular 
value decomposition (1130; Figure 11). This technique allows 
then for a direct identification of the unitary seed solution of 

= {u^ } leading the set of all possible solutions of the 
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equation. Euclid' 




three parameters a, p and X. 

Once the seed solution U = {u } is found (by SVD) , one may 

identify the final ''Euclidean" solution U = {Wj} at 1140 if the 
depth of (at least) three points in the scene are known. 
Without loss of generality, assume that these points are Pk for k 
= 1,2,3 (with depths Zk) . Those points provide then three linear 
equations in the unknown coefficient vector Qr=[a, (} /]'^ 



for k = 1,2,3 resulting into a linear system of three equations 
and three unknowns. This system may then be solved, yielding 
the three coefficients a, p and A, and therefore the final 

solution vector U. Complete Euclidean shape reconstruction is 
thus achieved. If more points are used as initial depths, the 
system may be solved in the least squares sense (once again 
optimal in the inverse depth error sense) . Notice that the 
reference depth points do not have to be intersection points as 
the equation contends. Any three (or more) points in the edge- 
web may be used. 

, While the above has described strict reconstruction, other 
clues about the scene may also be used. These clues may include 




^J^k Attorney DockJ^^c 

:t^ros on the scene, anqles between 
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planarity of portWRs on the scene, angles between the different 



planes, or mutual distances between points in space. Any of 
these geometric clues can help either simplify the calculations, 
or make the calculations more strictly Euclidean. 
Another embodiment, shown in Figure 14, generalizes the above 
operation to use with multiple light sources. Using the 
notation above, two calibrated light sources may be used. In 
this case, the matrix A has a rank that is increased to 2N-2. 
To shadow planes from the two light sources SI and S2 can 
intersect at least at two points in the scene. As described 
above, depth information at those two points can then propagate 
along the two edges. They also propagate to the rest of the 
edge web as described above. 

Two shadow planes Iln and Ilm are generated from the two 
light sources as shown in Figure 14. Two corresponding shadow 
edges £^n and ^m intersect on the image plane at the two points P. 
Depth information at Pk and qk propagates through the shadow 
edges for and am. This provides further information which can be 
used for further detection of information in the scene. 



B-Dual-Space geometry 



The notation in the foregoing refers to "Dual Space" and "B- 
shape". The meaning of this notation is further described 
herein. All definitions are given in Euclidean space as well as 
in projective geometry. A mathematical formalism called B-dual- 
space geometry is derived from projective geometry. This 
formalism enables us to explore and compute geometrical 
properties of three-dim.ensional scenes with simple and compact 
notation. This will be illustrated in the following chapter when 
applying that formalism to the problem of camera calibration. 

Let (E) be the 3D Euclidean space. For a given position of 

a camera in space we define T = {0^, X^,Y^, Z^) as the standard frame 
of reference (called "camera reference frame") where is the 
camera center of projection, and the three axes (O^, X^^, (C{,, 1^) 
and {0^,Z^) are mutually orthogonal and right-handed {{0^,X^) and 
{0^,Y^) are chosen parallel to the image plane ). 

We may then refer to a point P in space by its 
corresponding Euclidean coordinate vector X = [X Y Z] in that 
reference frame T . The Euclidean space may also be viewed as a 
three-dimensional projective space . In that representation, 
the point P is alternatively represented by the homogeneous 4- 

vector X:^[X Y Z \f . The sign denotes a vector equality up 
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m 



m 



to a non-zero scalar. Therefore, any scaled version of 



[X y Z l] represents the same point in space. 



A plane 11 in space is defined as the set of points P of 
homogeneous coordinate vector X that satisfy: 



where ^ — I tt^, tt^ tt^ tt J is the homogeneous 4-vector parameterizing 



Observe that if tt is normalized such that tt^ + tt,^ + tt^ = 1, then 

^t;-[^x '^y '^z'] is the normal vector of the plane 11 (in the camera 
reference frame T) and <i^=-7r^its orthogonal (algebraic) 
distance to the camera center 

Image plane and perspective projection 

Let (/) be the 2D image plane. The image reference frame is 
defined as (c,x^,y^) where c is the intersection point between 
{0^,Z^) (optical axis) and the image plane, and {cj x^ ) and (c, y^) are 
the two main image coordinate axes (parallel to [0^, X^) and 

(0^,1^^)). The point c is also called optical center or principal point. 
Let p be the projection on the image plane of a given point 

P of coordinates Y zf^ , and denote x = [a: y]^ 




(2.1) 



the plane 11 ((.) is the standard scalar product operator). 
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its coordinate vector on the image plane. Then, the two vectors 

X and X are related through the perspective projection 
equation : 



X = 



X 


_ 1 


X 


y 


~ z 


Y 



(2.2) 



This projection model is also referred to as a "pinhole" camera 
model . 

In analogy to Euclidean space, it is sometimes useful to 
view the image plane as a two-dimensional projective space 
In that representation, a point p on the image plane has 
homogeneous coordinate vector x~[x y If. Similarly to V\ any 

scaled version of [x y If describes the same point on the image 
plane . 

One advantage of using projective geometry is that the 
projection operator defined in equation 2.2 becomes a linear 
operator from to . 



X:^PX with P = [/3^3 O3.,] 



(2.3) 



where X and x are the homogeneous coordinates of P and p 
respectively, I,^, is the 3x3 identify matrix and O3,, is the 
3x1 zero-vector. Observe from equation 2.3 that x is equal (up 
to a scale) to the Euclidean coordinate vector X =[x Y z] of P : 



x~X 



(2.4) 
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Therefore x is also referred to as the optical ray direction 
associated to P . 

A line A on the image plane is defined as the set of points 
p of homogeneous coordinate vectors x that satisfy: 

(a,x^ = 0 (2.5) 

where A^j^A^, A^ A^J is the homogeneous 3 -vector defining the line 
A. Observe that if A is normalized such that A^+Ay=l, then 

n;^=|^A^ A^J is the normal vector of the line A (in the image 

reference frame) and d^=-X^ its orthogonal (algebraic) distance 
to the principal point c . 

Claim 1: Let and ^te two distinct points on the image plane 
with respective homogeneous coordinate vectors x, and X2 . Then, 
it is straightforward to show that the line A connecting and 
P2 has homogeneous coordinate vector A x, x X2 , where x is the 
standard vector product operator in . 

Claim 2: Let A, and A2 be two distinct lines on the image plane 

with respective homogeneous coordinate vectors Aj and A2 . Then, 
the point of intersection p between the two lines A, and A2 has 

homogeneous coordinate vector x:^A, x A2 . If the two lines are 

parallel then the last coordinate of x is zero. In that case p 
is a point at infinity. 
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There exists useful relations between lines on the image 
plane and planes in space, as illustrated by the two following 
examples . 



Example 1: Consider a line A on the image plane of coordinate 

vector A=|^Aj. A,^ A^j . Then, the set of points P in space that 

project onto A is precisely the plane 11^ spanned by A and the 

camera . Let tt^^ be the coordinate vector of 11;^ . Let us 

compute TT;^ as a function of A. According to equations 2.3 and 

2.5, a point P of homogeneous coordinate vector X will lie on 
11;^ if and only if: 



(a,px) = 



0 



(2.6) 



this relation enforces the projection of P to lie on A . This 
may be alternatively written: 

(P^A,X^ = 0. (2.7) 
Therefore the plane coordinates tt;^ has the following expression 



Tf;, ~ P^A = 



A 
0 



K 

\ 
K 
0 



(2.8) 



Example 2: Consider two planes in space n, and Ilj of respective 
coordinate vectors ff, ~ [tt^^ tt^,^ tt^^ tt^J and Tf.^^ [tt^^ tt,^^ tt^^ tt^ ] . 
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Assume the two planes intersect along line A in space, and call 
A the resulting image line after projection of A onto the image 
plane. Let us compute the coordinate vector A of A as a 
function of vfj and • Consider a point P on A and denote p 
its projection on the image plane. Since P lies on 11, and 112, 

its homogeneous coordinate vector X c::^ r z 1] must satisfy the 
following system: 



Since the homogeneous coordinate vector of p is x~X = [X Y ZY , 
equation 2.10 reduces to a standard image line equation: 



(2.9) 



This system yields: 




= 0. 



(2 .10) 




(2.11) 



where 




A ~ 




(2.12) 
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that is the coordinate vector of X, projector of A = ninn2 . 

Rigid body motion transformation 

Consider a set on N points P- in space (z = 1,...,7V) , and let 

Xt=^Xi Y- be their respective coordinate vectors in the 

camera reference frame T , Suppose the camera moves to a new 
location in space, and let Xl =^Xl Y{ Z[J be the coordinate 
vectors of the same points P- in the new camera reference frame 

T' . Then X- and Xi are related to each other through a rigid 
body motion transformation: 

Vz = (U...,iV), Xl=RX^^T (2.13) 

Where ReS0(3) , which is a special orthogonal 3x3 matrix, and T 
are respectively a 3x3 rotation matrix and a 3 -vector that 
uniquely define the rigid motion between the two camera 
positions. The matrix R is defined by a rotation vector 

n = [l7^ Qy n J such that: 

R==e^^ (2.14) 
where QA is the following skew- symmetric matrix: 



41 




0 -fi, Q,j 
-Qy 0 



(2.15) 



Equation 2.14 may also be written in a compact form using the 
Rodrigues ' formula : 



i2 = /3,3COS(^) + [QA] 



0 



+ 



l-cos(<9) 
0' 



(2.16) 



where ^ = ||f2||, and fl^ is the following semi-positive definite 
matrix : 



(2.17) 



The fundamental rigid body motion equation 2.13 may also be 
written in projective space . In , the point has 

homogeneous coordinate vectors Xi ~ ^X- l]^ and 

Xi' [X- Y- Z'- ij^in the first {T) and second [T') reference frames 
respectively. Then, equation 2.13 may be written: 



Xi ~£)Xi with D = 



R 



0 



1x3 



T 
1 



(2.18) 



where 0,^3 is a 1x3 zero row vector. Observe that the inverse 
relation may also be written as follows: 
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X/:^£)-^X/ with Dr^ = 

Let p'^ be the projection of P- onto the second camera image 

plane, and let x- be the homogeneous coordinate vector. Then, 
following the equation 2.3, we have: 

x; ~ PX/ (2.20) 

which may be also written: 

x; ^ V% (2.21) 
where: which may be also written: 

P' = PD = [i2 r] (2.22) 

The matrix P' is the projection matrix associated to the second 
camera location. 

Consider now a plane 11 of homogeneous coordinate vectors tt 

and 7f' in both camera reference frames T and T* . How do tt and 

7f' relate to each other? Consider a generic point P on IT with 

homogeneous coordinate vectors X and X in both reference 
frames. According to equation 2.1, we have: 



0 



1x3 



1 



(2.19) 
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(7f,X) = 0 



(2 .23) 



which successively implies : 




= 0 



= 0 



(2.25) 



(2.24) 



Therefore : 



tt' ~ D"^7r = 



R 



0. 



TT 



(2.26) 



Similarly, the plane coordinate vector before motion tt may be 
retrieved from Tf through the inverse expression: 



In order to put in practice these concepts, let us go 
through the following example: 

Exeunple 3: In the second reference frame J^' (after camera 

motion) , consider a line A' on the image plane, and the plane 11 
that this line spans with the camera center (similarly to example 

1 . Let A' and tt' be the homogeneous coordinate vectors of A' 



R^ ft 



(2.27) 
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and n in ^' . Let us compute tt , the coordinate vector of 11 in 

the initial camera reference frame T (before motion) as a 

function of A' , R and T . 

According to equation 2.8, tT' and A' are related through the 
following expression: 



TT 



A' 
0 



(2.28) 



Then, TT may be calculated from tt' using equation 2.27: 







A' 










0 







(2.29) 



where P' is the projection matrix associated to the second camera 
location (eq. 2.22). Observe the similarity between equations 
2 . 8 and 2.29. 



Definition of B-dual-space 

As presented in the previous section, a plane 11 in space is 
represented by an homogeneous 4 -vector vr — j^Tr^ tt^ tt^ tt^J in the 

camera reference frame T={0^, X^, Y^, Z^^ (see equation 2.1). 
Alternatively, if E does not contain the camera center 
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(origin of T) then it may be represented by a 3 -vector 
u;= u)y , such that: 




(2.30) 



for any point P e 11 of coordinate vector X = ^ Y zj in T . 
Notice that iD = n^/d^ where is the unitary normal vector of the 
plane and d^^O its distance to the origin. Let (fi) = R3. Since 
every point a; € (fi) corresponds to a unique plane 11 in Euclidean 
space (E) , we refer to (Q) as the 'plane space' or 'B-dual- 
space' . For brevity in notation, we will often refer to this 
space as the dual-space . There exists a simple relationship 

between plane coordinates in projective geometry and dual-space 
geometry: 



U! = 

TT, 



TT. 



TT. 



if TTj ^0 



(2.31) 



In that sense, dual -space geometry is not a new concept in 
computational geometry. Originally, the dual of a given vector 
space (E) is defined as the set of linear forms on (E) (linear 
functions of (E) into the reals R ) . In the case where (E) is 
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the three dimensional Euclidean space, each linear form may be 
interpreted as a plane 11 in space that is typically 

parameterized by a homogeneous 4-vector 7r~[7r^ tt^ tt^ 'n^'J . A point 
P of homogeneous coordinates X = ^X Y Z l]^ lies on a generic 

plane 11 of coordinates tt if and only if ^7f,X^=0 (see [13]). 

Our contribution is mainly the new tJ-parameterization. We will 
show that this representation exhibits useful properties allowing 
us to naturally relate objects in Euclidean space (planes, lines 
and points) to their perspective projections on the image plane 
(lines and points) . One clear limitation of that representation 
is that plane crossing the camera origin cannot be parameterized 
using that formalism (for such planes 7rj = 0). However, this will 
be shown not to be a critical issue in all geometrical problems 
addressed in this thesis (as most planes of interest do not 
contain the camera center) . 

Properties of B-dual space 

This section presents the fundamental properties attached, to 
dual -space geometry. 

The following proposition constitutes the major property 
associated to our choice of parameterization: 

Proposition 1: Consider two planes n„ and ITj in space, with 

respective coordinate vectors a;„ and ^ (q. q) in dual-space, and 
let A = n„nnj, be the line of intersection between them. Let A 
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be the perspective projection of A on the image plane, and A 
its homogeneous coordinate vector. Then A is parallel to ojj-a^. 
In other words, - H is a valid coordinate vector of the line 
A . 

Proof: Let PeA and let p be the projection of P on the 

image plane. Call X = ^X Y zj and x:^-^X the respective 
coordinates of P and p. We successively have: 



Notice that this relation is significantly simpler than that 
derived using standard projective geometry (equation 2.12). 

In addition, observe that the coordinate vector u of any 



^nd in dual-space (Q) . We denote that line by A and call 
it the dualimageot A. The following definition generalizes that 
concept of dual image to other geometrical objects: 




(since Z^Q) 



plane 11 containing the line A lies on the line connecting 
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Definition: Let ^ be a sub-manifold of (E) (e.g., a point, 

line, plane, surface or curve) . The dual image of A of A is 
defined as the set of coordinates vectors cD in dual -space (Q) 

representing the tangent planes to A . Following that standard 
definition (see [13, 14]), the dual images of points, lines and 
planes in (E) may be shown to be respectively planes, lines and 
points in dual-space (Q) . Further properties regarding non- 
linear sub-manifolds may be observed, such as for quadric 
surfaces in [15] or for general apparent contours in space in 
[16] . 

The following five propositions cover the principal 
properties attached to the dual -space formalism. 

Proposition 2 - Parallel Planes - Horizon Line: Let and 

be two parallel planes of coordinates and 4, . Then is 

parallel to u\ . 

Proof: The planes have the same surface normals n^=n^. 

Therefore, the proposition follows from definition of O . 

The horizon line H represents the ''intersection" of two 

planes at infinity. The dual image of H is the line H 
connecting and and crossing the origin of the (Q) space. 
The direction of that line is not only the normal vector of the 
two planes ri^ — rif, , but also the representative vector of the 
projection Xfj of H (horizon line) on the image plane (according 
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to proposition 1) . Although H is not a well-defined line in 
Euclidean space (being a line at infinity) , under perspective 
projection, it may give rise to a perfectly well defined line 
on the image plane {for example a picture of the ocean) . Once 
that line is extracted, the orientation of the plane is known: 



Proposition 3 - Orthogonal Planes: If two planes IT^ and are 

two orthogonal, then so are their coordinate vectors and cJ^, 
in dual-space. Consequently, once one of the plane u)^ is known, 

then Qi, is constrained to lie in the sub-space orthogonal to u;„ , 
a plane in dual -space. 

Proposition 4 - Intersecting lines: Consider two lines A„ 
and A;, intersecting at a point P , and call 11 the plane that 
contains them. In dual -space, the two dual lines A„ and Af, 

necessarily intersect at (D the coordinate vector of E (since (D 
is the plane that contains both lines) . Similarly, the dual 

image P of P is the plane in dual -space that contains both dual 

lines A„ and A^, . Notice that P does not cross the origin of 



(2.32) 
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Proposition 5 - Parallel lines - Vanishing Point: Consider 

two parallel lines and A^, belonging to the plane 11 of 
coordinates u) . Then cD is at the intersection of the two dual 

A A 

lines Aa and A^, . In dual -space, the plane containing both dual 
lines A^ and A^, is the dual image of V of the vanishing point V , 
i.e., the intersection point of A^ and A^, in Euclidean space. 
If H is the horizon line associated with IT, then V e H , which 

A A A 

translates in dual-space into H e V . Since H contains the 
origin, so does V , Notice that once the perspective projection 

A 

of y is observable on the image plane, the plane V is 
entirely known (since its orientation is the coordinate vector of 
V) . 

Proposition 6 - Orthogonal lines: Let Ai and A2 be two 

orthogonal lines contained in the plane 11 of coordinates (D ) and 

let = Ai n A2 . Consider the set of planes orthogonal to IT . 
In the dual-space, that set is represented by a plane containing 
the origin, and orthogonal to O (see proposition 3) . Call that 

A 

plane V (it can be shown to be the dual image of a vanishing 
point) . In that set, consider the two specific planes I!] and 

that contain the lines A, and A2 . In the dual-space, the 
representative vectors (D^ and (Dj of those two planes are defined 
as the respective intersections between V and the two lines A, 

A 

and A2 . Then, since the two lines A, and A2 are orthogonal, the 
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two vectors and CD2 are also orthogonal. This implies that 
the images of the two vanishing points and V2 associated to 
the lines Aj and A2 are orthogonal in the dual -space. Two 
vanishing points are enough to recover the horizon line H 
associated with a given plane IT in space. Therefore, observing 
two sets of parallel lines belonging to the same plane under 
perspective projection allows us to recover the horizon line, and 
therefore the orientation of the plane in space (from proposition 
2) . The horizon line H corresponding to the ground floor H is 
recovered from the two vanishing points V| and V2 . 

2*2.3 Geometrical problems solved in B-dual-space 

This section presents several useful geometrical problems solved 
using dual -space geometry. 

Example 4: Let 11 be a plane in space of coordinate vector lD . 

Let P be a point on 11 with coordinate vector X = [X Y Z]^ . 
Let p be the projection of P onto the image plane, and denote 
X ^ [a: y 1]^ its homogeneous coordinate vector. The 
triangulation problem consists of finding the point P from its 
projection p and the plane 11 , or calculating X from x and Q . 
Since P lies on the optical ray its coordinate vector 

satisfies X — x or equivalently , X = Zx , with x ~ [x y 1]^ . in 



52 



addition, since P lies on the plane 11, we have (u;,X) = 1. This 
implies : 

This is the fundamental triangulation equation between a ray and 
a plane in space. 

Example 5: Consider two camera frames T and J^' and let {R,T} 

be the rigid motion parameters between !F and J^^ . Let IT be a 
plane in space of coordinate vectors u) and a;' in T and 
respectively. How do u and relate to each other? 

Consider a generic point P on 11 of coordinate vectors X 

and X in T and respectively. Then, X = RX + T . Since 
P G n, we may write: 

(cl;',^') = 1 (2.34) 
{uj\RX + T) = \ (2.35) 
{R^uj\X) = 1 - {Co\T) (2.36) 



Therefore : 
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This expression is the equivalent of equation 2.27 in dual-space 
geometry. Notice that the condition {u)\T) ^ 1 xs equivalent to 
enforcing the plane 11 not to contain the origin of the first 
camera reference frame !F . That is a necessary condition in 
order to have a well defined plane vector Q . The inverse 
expression may also be derived in a similar way: 

In that case, the condition {Cj,-R^T) ^ 1 constraints the plane 11 
not to contain the origin of the second reference frame (in 
\A order to have a well defined vector a;') . 

In In some cases, only one of the two plane vectors u or tU' is 

I ! 

fj\ well-defined. The following example is one illustration of such 
a phenomenon. 

T\ \ 

% Excunple 6: Consider the geometrical scenario of example 5 where 

12 

the plane IT is now spanned by a line A' on the image plane of 
the second camera reference frame (after motion) . In that 

case, the coordinate vector cj' is not well defined (since by 
construction, the plane 11 contains the origin of J"' ) . However, 
the plane vector (D may very well be defined since 11 does not 
necessarily contain the origin of the first reference frame !F . 
Indeed, according to equation 2,29, the homogeneous coordinate 
vector 7f of n in is given by: 
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TT 



{T,X') 



(2.40) 



where A' is homogeneous coordinate vector of the image line A' in 

J^' . Then, according to expression 2.31, the corresponding dual- 
space vector O is given by: 

which is perfectly well-defined as long as 11 does not contain 
Oc , or equivalently if (^^A') ^ 0. This condition is equivalent 
to enforcing the line not to contain the epipole on the image 
plane attached to camera frame. The point is the projection of 
onto the image plane attached to the second camera reference 
frame . 

Example 7: triangulation of an optical ray (O^.p) with the plane 
n spanned by the line A' in the other camera reference frame 
. Let X — [X Y Z]^ be the coordinates of P in space and 

X — [x y 1]-^ the coordinates of its known projection p on the 
image plane. Equation 2.41 provides then an expression for the 
coordinate vector a; of IT in frame T : 

where A' is the homogeneous coordinate vector of the image line 
A' in The triangulation expression given by equation 2.40 

returns then the final coordinate vector of P : 
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Observe that the plane n is not allowed to cross the origin 
the initial reference frame T , otherwise, triangulation is 
impossible. Therefore the plane vector a; is perfectly well 
defined (i.e. , (r,A') ^ 0) . 
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3 .LI Camera Calibration in B-dual-space geometry 
The position of a point p in a real image is originally 
expressed in pixel units. One can only say that a point p is at 

the intersection of column = 150 and row Py =50 on a given 

digitized image. So far, we have been denoting x=[x y if the 
homogeneous coordinate vector of a generic point p on the image 
plane. This vector (also called normalized coordinate vector) is 

directly related to the 3D coordinates X = [X Y Zf of the 

corresponding point P is space through the perspective 
projection operator (eq. 2.2). Since in practice we only have 

access to pixel coordinates p—\p^ Py 1]^, we need to establish a 

correspondence between p and x (from pixel coordinates to 
optical ray in space) . 

Since the origin of the image reference frame is at the 

optical center c (or principal point) , it is necessary to know 

the location of that point in the image: c =[c^ c^] (in pixels). 

Let be the focal distance (in meters) of the camera optics 
(distance of the lens focal point to the imaging sensor) , and 

denote by d^ and d^ the x and ?/ dimensions of the pixels in the 

imaging sensor (in meters). Let j^—f^fd^ and fy—fo/dy (in 

pixels) . Notice that for most imaging sensors currently 
manufactured, pixels may be assumed perfectly square, implying 
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d^— dy or equivalently f^=fy. In the general case and fy may 
be different. 

Then, the pixel coordinates p — [Pj: Py if of a point on the 
image may be computed from its normalized homogeneous coordinates 
x=[x y \f through the following expression: 

Px = fx^ 

, (3.1) 

That model assumes that the two axes of the imaging sensor are 
orthogonal. In the case where they are not orthogonal, the pixel 
mapping function may be generalized to: 

Px = fx^-^fyV + c^ 

(3.2) 

Py = fyV + ^'y 

where a is a scalar coefficient that controls the amount of skew 
between the two main sensor axes (if a — 0, there is no skew). 
For now, let us consider the simple model without skew (equation 

3.1). If ^> is the image projection of the point P in space (of 

coordinates X = [X Y Zf), the global projection map may be 
written in pixel units: 

This equation returns the coordinates of a point projected 
onto the image (in pixels) in the case of an ideal pinhole 
camera. Real cameras do not have pinholes, but lenses. 
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Unfortunately a lens will introduce some amount of distortion 
(also called aberration) in the image. That makes the projected 
point to appear at a slightly different position on the image. 
The following expression is a simple first -order model that 
captures the distortions introduced by the lens: 

pinhole projection 
radial distortion (3.4) 
pixel coordinates 

where is called the radial distortion factor. This model is 
also called first -order symmetric radial distortion model 
("symmetric" because the amount of distortion is directly related 
to the distance of the point to the optical center c ) . Observe 

that the systems (3.4) and (3.3) are equivalent when k^=0 (no 
distortion) . 

Therefore, if the position of the point Pis known in 

camera reference frame , one may calculate its projection onto the 
image plane given the intrinsic camera parameters f.^, f^, c^, c^ and 
K- That is known as the direct projection operation and may be denoted 
p = Y{{X). However, most 3D vision applications require to solve 
the "inverse problem" that is mapping pixel coordinates p to 3D 

world coordinates [X Y Zf . In particular, one necessary step 
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Y/Z 




y. 



b = 


h. 


= x(\-i-k^\\a 


p = 








Py 








is to compute normalized image coordinates x — [x y 1]^ (3D ray 
direction) from pixel coordinates p (refer to equation 3.4), 
The only non-trivial aspect of that inverse map computation is i 

computing the vector a from b . This is the distortion 
compensation step. It may be shown that for relatively small 
distortions, this inverse map may be very well approximated by 
the following equation: 

b 



(3.5) 



MM 

Experimentally, this expression is sufficiently accurate. 
Camera calibration 

The camera calibration procedure identifies the intrinsic camera 
parameters fx^fy^^x^^y K (^nd possibly a). A standard method 

is to acquire an image a known 3D object (a checker board 
pattern, a box with known geometry...) and look for the set of 
parameters that best match the computed projection of the 
structure with the observed projection on the image. The 

reference object is also called calibration rig . Since the camera 

parameters are inferred from image measurements, this approach i 
also called visual calibration . This technique was originally 

presented by Tsai and Brown. An algorithm for estimation was 
proposed by Abdel-Aziz and Karara (for an overview on camera 
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calibration, the reader may also refer to the book Klette, 
Schluns and Koschan ) . 

Figures 4A and 4B show two examples of calibration images 
when using a planar rig (checker board pattern) . Figure 5 shows 
a 3D rig (two orthogonal checker board patterns) . 

Note that although the geometry of the calibration rig is 
known (i.e., the mutual position of the grid corners in space), 
its absolute location with respect to the camera is unknown. In 
other words, the pose of the calibration pattern in unknown. 
Therefore, before applying the set of equations (3.4) to compute 
the image projection of every corner in the structure, one needs 
to find their 3D coordinates in the camera reference frame. We 
first choose a reference frame attached to the rig (called the 

object frame) in which we express the known coordinates X\ of 
all the corners (i = . This set of vectors is known since 

the intrinsic rig structure is known. Then, the 

coordinate vector Xc of P- in the camera frame is related xl 
through a rigid motion transformation: 

Vi = l,...,iV, Xc=R,To+T^^ (3.6) 

where and define the pose of the calibration rig with 

respect to the camera (similarly to equation 2.13). See figure 
3.2. 

Notice that by adding the calibration object in the scene, 
more unknowns have been added to the problem: and . Those 
parameters are called extrinsic camera parameters since they are 
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dependent upon the pose of the calibration pattern with respect 
to the camera (unlike the intrinsic parameters that remain 
constant as the rig is moved in front of the camera) . 

Let Qc be the rotation vector associated to the rotation 
matrix (see equation 2.13). Then, the complete set of 
unknowns to solve for is: 



• Radial distortion factor: (1 DOF) , 

• Calibration rig pose: Qc, Tc (6 DOF). 

Therefore, the global calibration problem consists of solving for 
a total of 11 scalar parameters (adding the skew coefficient a 
would bring the number of unknowns to 12) . 

Let p-{i = 1,...,A^) be the observed image projections of the 

rig points P- and let p.^ = [pi p^]^ be their respective pixel 

coordinates. Experimentally, the points p.- are detected using 
the standard Harris corner finders. 

The estimation process finds the set of calibration unknowns 
(extrinsic and intrinsic) that minimizes the reprojection error. 
Therefore, the solution to that problem may be written as 
follows: 



• Focal length: /^,/^ (2 DOF), 



• Principal point coordinates: c^,Cy (2 DOF), 



{•^' fy 



N 



(3.7) 
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where R^=e^'^,U(,) is the image projection operator defined in 

equation 3.4 (function of the intrinsic parameters fx^fy^<^x^^y 

k^) and {|.{{ is the standard distance norm is pixel units. This 
non-linear optimization problem may be solved using standard 
gradient descent techniques. However, it is required to have a 
good initial guess before starting the iterative refinement. a 
method to derive closed form expressions for calibration 
parameters that may be used for initialization is presented. 

Apart from numerical implementation details, it is also 
important to study the observability of the model. In other 
words, under which conditions (type of the calibration rig and 
its position in space) can the full camera model (eq. 3.4) be 
estimated from a single image projection. For example, it is 
worth noticing that if the calibration rig is planar (as shown on 
figure 3.1-a) the optical center c cannot be estimated (the two 
coordinates and ) . Therefore, in such cases, it is 

necessary to reduce the camera model to fewer intrinsic 
parameters and fix the optical center in the center of the image. 
Further discussions on the camera model observability are 
presented. 

Closed- form solution in B-dual-space 
geometry 

This section demonstrates how one may easily retrieve closed-form 
expressions for intrinsic and extrinsic camera parameters using 
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the dual-space formalism as a fundamental mathematical tool. The 
method is based on using vanishing points and vanishing lines. 
The concept of using vanishing points for camera calibration is 
not new (most of the related work on this topic may probably be 
found in the art. Therefore, this does not to state new concepts 
or theories on calibration, but rather illustrate the convenience 
of the dual -space formalism by applying it to the problem of 
calibration. We show here that this formalism enables us to keep 
the algebra simple and compact while exploiting all the 
geometrical constraints present in the scene (in the calibration 
rig) . That will also lead us to derive properties regarding the 
observability of several camera models under different 
geometrical configuration of the setup, and types of calibration 
rig used (2D or 3D) . Most related work on that topic only deal 
with simple camera model (unique focal length) and extract the 
extrinsic parameters through complex 3D parameterization (using 
Euler angles) . Other standard methods for deriving explicit 
solutions for camera calibration were presented by Abdel-Aziz and 
Karara and Tsai. These methods are based on estimating, in a 
semi-linear way, a set of parameters that is larger than the real 
original set of unknowns and do not explicitly make use of all 
geometrical properties of the calibration rig. 

The method that we disclose here involves very compact 
algebra, uses intuitive and minimal parameterizations, and 
naturally allows to exploit all geometrical properties present in 
the observed three-dimensional scene (calibration rig) . In 
addition, our approach may be directly applied to natural images 
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that do not contain a special calibration grid (such as pictures 
of buildings, walls, furniture...). 

Once it is computed, the closed- form solution is then fed to 
the non-linear iterative optimizer as an initial guess for the 
calibration parameters. This final optimization algorithm is 
inspired from the method originally presented by Tsai including 
lens distortion (see equation 3.7). The purpose of that analysis 
is to provide a good initial guess to the non-linear optimizer, 
to better insure convergence, and check for the consistency of 
the results. 

We will first consider the case of a calibration when using 
a planar rig (a 2D grid) , and then generalize the results to 3D 
rigs (such as a cube) . In those two cases, different camera 
models will be used. 

3o2ol. When using a planar calibration rig 

Consider the calibration image shown in figure 4A. Assuming no 
lens distortion (fc^=0) and no image noise, the grid may be 
summarized by its four extreme corners on the image (intuitively, 
one may localize all the inside grid corners from those four 
points by simple perspective warping) . In practice, all points 
will be used in order to be less sensitive to image noise, 
however the principle remains the same. Then, the basic observed 
pattern is a perspective view of a rectangle of known dimensions 
LxW . Without loss of generality, we can also assume that this 
rectangle is a square. The reason for that is that through a 
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similar perspective image warping, it is always possible to 
convert a perspective view of a rectangle into a perspective view 
of a square, given that the dimensions of the original rectangle 
are known (actually, only the ratio WfL is necessary) . 

Figure 15 shows a perspective image of a square ABCD . The 
four points x^.x^.x^ and x^ are the coordinate vectors of the 
detected corners of the square on the image plane 

after normalization . This means that the x vectors are computed from 
the pixel coordinates of the points after subtraction of the 
optical center coordinates (c^;C^) (in pixel) and scaling by the 

inverse of the focal length (in pixel as well) . To model the 
aspect ration in x and y , one can assume two distinct focal 

lengths and in both image directions (to account for non- 
square CCD pixels) . 

In the case of calibration from planar rigs, it is known 
that the optical center position {c^,Cy) cannot be estimated. 

Therefore, we will keep it fixed at the center of the image, and 
take it out of the set of unknowns. The resulting intrinsic 

parameters to be estimated are therefore and . Let 

Pi — [Px Py (i = h -A) be the pixel locations of the corners 

after subtraction of the optical center (in homogeneous 
coordinates) . Then one can extract the x vectors through a 
linear operation involving the focal lengths and : for 

2 = 1,. ..,4, 
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V/. 0 0 



(3.8) 



0 0 1 



where K is the intrinsic camera matrix containing the intrinsic 
parameters (f^andfy). Let us now extract the set of independent 

constraints attached to the observation in order to estimate the 
focal lengths (hence the camera matrix K) . 

Figure 15 shows the set of corner points x. on the image 
plane. Following proposition 5 of section 2.2.2, lines 
A- (z = l,...,4) are used to infer the two vanishing points and V2 
in order to recover the projection of the horizon line H 
associated to the plane 11^. The derivation is as follows: 



where x is the standard vector product in M . Notice that in 
order to keep the notation clear, we abusively used and V2 to 
refer to the homogeneous coordinates of the vanishing points on 
the image plane (quantities similar to the x . ' s using homogeneous 
coordinates) . It is important to keep in mind that all 
equalities are defined "up to scale." For example, any vector 

proportional to x Xj would be a good representative for the same 



A I — 3/ 1 X x^ 

A 2 — ■^/^ X 




(3.9) 



A -J — x^ X X' 



— ^4 X X| 
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line Aj . The same observation holds for the coordinate vectors 
of the vanishing points and that of the horizon line. 

Yet, the normalized coordinates x- of the corners are not 
directly available, only the pixel coordinates p- . However, all 

x.'s can be retrieved from the p.'s through the linear equation 

3.8. We will use of the following statement whose proof may be 
found in [30] : 

Claim 1: Let K be any 3x3 matrix, and u and v any two 3- 
vectors. Then the following relation holds: 

(Ku)x(Kv) = K\uxv) 

where K* is the adjoint of K (or the matrix of cof actors of 
K). Note that if K is invertible (which is the case here), 
then K* = dQX(K){K^)-^ , and consequently K** cxK . 

Using that claim, the camera matrix K (or K* ) may be 
factored out of the successive vector products of equations 3.9, 
yielding : 

















2^ K*X^ 



where A/\ A^, Aj', A/, and A^' are line and point coordinate 

vectors on the image plane in pixel (directly computed from the 
pixel coordinates PpPj'^a Pa^ ' 
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^4 - Pa ^ Pi 



The step of inferring the vanishing points Vj and V2 from the 
pairs of lines {ApAj} and {Aj,^} made use of the fact that ABCD 

is a parallelogram (proposition 5) . Using proposition 6 (in 
section 2.2.2), one naturally enforce orthogonality of the 

pattern by stating that the two vanishing points V^ and V2 are 

mutually orthogonal (see figure 2.11) : 



V^±V2 <^ iKV^'')±iKVl) 



(3.10) 



That provides one scalar constraint in the focal lengths and 



(3.11) 



f f 

where a^^ a^^ b^, b^^ and are the known pixel coordinates of the 

vanishing points 7/' and : V,^ ^ [a, 6, c,f and [a^ b^ c^f . 

Notice that equation 3.11 constraints the two square focals 
(fx^fy) to lie on a fixed hyperbola. Finally, the parallelogram 

ABCD is not only a rectangle, but also a square. This means 
that its diagonals (AC) and (BD) are also orthogonal (see figure 
15) . This constraint is exploited by enforcing the two vanishing 
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points attached to the diagonal and to be mutually 
orthogonal (proposition 6) . Those points are extracted from 
intersecting the two diagonal lines and with the horizon 

line A^ (see figure 15) . Following the same process of 

factoring the K matrix (or K* ) out of every successive vector 
product, one obtains: 

where V^^ and are the two pixel coordinates of the vanishing 

points F3 and (pre-computed from the pixel coordinates of the 
corner points) : 

Then, the orthogonality of and yields (y^f{K'^K)(y^) = 0, 
or : 

-7^ + 72^ + ^3^4 = 0 (3.12) 

where a^, a^.h^^b^, c-^ and are the known pixel coordinates of V^' 

and : 1^3^' ~ [03 63 c^f and 7/ ~ [a^ c^f . This constitutes a 

second constraint on and (a second hyperbola in the Ux->fy) 
plane), which can be written together with equation 3.11 in a 
form of a linear equation in u—[u^ u^'f =^\\lf^ ^1 fy^ * 
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Au = b with A = 



and b — — 



If ^ is invert ible, then both focals and may be recovered 
explicitly: 



u ~ 



or : 



/ = 





- 03046,62 


6,620304 


- 63640,02 


10,026364 


- 03046,62 



under the condition > 0 and ^ ^ . 

If A is not invertible (or a^a2bTb^ — a^a^bfi2 = ^ ) , then both 
focals (f^, fy) cannot be recovered. However, if A is of rank one 
(i.e. it is not the zero matrix), then a single focal length 
model = = may be used. The following claim gives a 

necessary and sufficient condition for A to be rank one: 
Claim 2: The matrix A is rank one if and only if the 

projection of the horizon line is parallel to either the x or 
y axis on the image plane (its first or second coordinate is 
zero, not both) , or crosses the origin on the image plane (its 
last coordinate is zero) . Since the matrix K is diagonal, this 
condition also applies to the horizon line in pixel coordinates 
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Corollary: Since is proportional to the surface normal 

vector fif^ (from proposition 2 in section 2.2.2), this degeneracy 

condition only depends upon the 3D orientation of the plane H^^ 

with respect to the camera, and not the way the calibration grid 
is positioned onto it (this is intrinsic to the geometry of the 
setup) . 

In such a rank-one degenerate case, the reduced focal model 
is acceptable. Then both constraint equations 3.11 and 3.12 may 

be written as a function of a unique focal / and follows: 



^3^4 + i>3&4 



which may be solved in a least squares fashion, yielding the 
following solution: 



V cfcl+cy, ^ ^ ^ 

Alternative estimates may be derived by directly solving for 
either one of the constraint equations (3.11 or 3.12) taking 

fx—fy~fc' This may be more appropriate in the case where one of 
the four vanishing points is at infinity (corresponding to 

= 0 ) . It is then better to drop the associate constraint and 
only consider the remaining one (remark: having a vanishing point 
at infinity does not necessarily mean that the matrix A is 
singular) . Since the vector X^j is parallel to the normal vector 
of the ground plane , this rank-one degeneracy case 
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corresponds to having one of the camera axis X^, or parallel 
to the calibration plane . 

Note that if two vanishing points are at infinity, then the 
projection of the entire horizon line, is also at infinity on 
the image plane (its two first coordinates are zero) . This 
occurs only when the calibration plane 11^^ is strictly parallel 

to the image plane (or n^=[0 0 l]^ ) , which is known to be a 
degenerate case where there exists no solution for calibration. 

In the case where the planar pattern is a rectangle, but not 
necessarily a square (or equivalently , the aspect ratio of the 
rectangle is not known) , then the diagonal constraint is not 
available (equation 3.12). In that case, only equation 3.11 is 
available to estimate focal length. It is therefore necessary to 
use a reduced single focal model f^ = f^=f^: 



This expression will be used in a calibration experiment 
illustrated in figure 3.5. 

Once the camera matrix K is estimated, the normalized 
horizon vector — K'^Xj^ may be recovered. From proposition 2, 
this vector is known to be proportional to the coordinate vector 
ujf, of 11/, (or its normal vector n/ J . Therefore, this directly 
provides the orientation in 3D space of the ground plane. The 
only quantity left to estimate is then its absolute distance df. 




(3.14) 
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to the camera center, or equivalently the norm ||cJ/J| = . This 
step may be achieved by making use of the known area of the 
square ABCD and applying an inverse perspective projection on 
it (possible since the orientation of 11/^ is known) . 

Implementation details: In principle, only the four extreme 
corners of the rectangular pattern are necessary to localize the 

four vanishing points V^^, V^, V/ and V/ . However, in order to be 
less sensitive to pixel noise, it is better in practice to make 
use of all the detected corners on the grid (points extracted 
using the Harris corner finder) . This aspect is especially 
important given that vanishing point extraction is known to be 
very sensitive to noise in the corner point coordinates 
(depending on amount of depth perspective in the image) . One 
possible approach is to fit a set of horizontal and vertical 
lines to the pattern points, and then recover the two vanishing 

points by intersecting them in a least squares fashion. 

Once these two points are extracted, the position of the extreme 
corners of the rectangular pattern may be corrected by enforcing 
the four extreme edges of the grid to go through those vanishing 
points. The next step consists of warping the perspective view 
of the rectangle into a perspective view of a square (making use 
of the known aspect ration of the original rectangle) . The two 

remaining vanishing points and V/ may then be localized by 
intersecting the two diagonals of this square with the horizon 

line A^' connecting V,^ and V^' . Once those four points are 
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extracted, the focal length may be estimated, together with the 
plane coordinate vector o;^ following the method described 
earlier (using a one or two focal model) . 

When using a 3D calibration rig 

Let us generalize the results to the case where a 3D rig of the 
type in Figure 5 is used for calibration. Figure 16 shows a 
perspective view of a cube in 3D. From that image, one may 
extract seven vanishing points V^,V2, ... V7 . Similarly to the case 
of a planar square, this set of points must satisfy five 
orthogonality properties : ± V2, ± V^, -L V^, ± F5 and ±Vt , 
Then, similarly to equation 3.10, we can write a set of five 
scalar constraints on the pixel coordinates of the vanishing 
points : 

' V, ±V2 ^ (V,^f (K^K) (K/) = 0 

V,±V, ^ (V,'f (K^K) = 0 
-V2±V3 => (V/f (iT^iT) = 0 (3.15) 

V,±Vs {Vif (K'^K)(V,n = 0 

Where K is the intrinsic camera matrix and 

—[^i bi c-i]^ (z = I,...,?) are the pixel coordinate vectors of the 
vanishing points (see figure 16) . Note that the first three 
constraints in (3.15) enforce mutual orthogonality of the faces 
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of the cube, whereas the last two force the left and fight faces 
(and therefore all the others) to be squares. 

Given those five independent constraints, one should be able 
to estimate a full 5 degrees of freedom (DOF) camera model for 

metric calibration including two focal lengths ( and fy in 

pixels), the optical center coordinates {c^ and Cy in pixels) and 
the skew factor a (see equation 3.2). In that case, the 
intrinsic camera matrix K takes its most general form [31] : 



0 0 1 



This model matches precisely the notation introduced in equation 

(3.2). Then, the semi-positive definite matrix K may be 
written as follows: 



1 

a 



a. 



(3.16) 



1 ^5 — ^3 

—Ut, — Z/4 M| 



(3.17) 



Notice that the vanishing point constraints (3.15) are 
homogeneous. Therefore, one can substitute K by its 
proportional matrix K . Doing so, the five constraints 

listed in equation 3.15 are linear in the vector it = [u, ... ^5]^ . 
Indeed, for G { (1, 2) , (1,3), (2,3), (4,5), (6,7)}, we have : 
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in 

i3 



ly 



n 



[-CiC, -bibj {uiCj+ajCi) (biCj+bjC) -(aibj+ajbi)]u = aiaj, 

(3.18) 

Therefore, this set of 5 equations may be written in a form of a 
linear system of 5 equations in variable u : 

Au = b 

where ^ is a 5 x 5 matrix, and b a 5-vector: 



(3.19) 







-C,C2 


6162 (OiC2+a2Ci) 


(6|C2 +62C1) 


-(0,62 +026,) 






a^a2 




A= 


-C,C3 


6163 (a, 03+030,) 


(61C3 +63C1) 


-(0,63 +036,) 










-C2C3 


62^3 (02C3 + 0302) 


(62C3 + 63C2) 


-(0263 + 0362) 




b = 








— C4C5 


6465 (0405 + 0504) 


(64C5 +65C4) 


-(0465 + 0564) 












-C6C7 


h^7 (^6^7 + 07^6) 




-(06^7 +^765) 








ru 




If the 


matrix A is 


invert ible. 


this system 


admits a 



solution u = A 'b . Finally, the intrinsic camera parameters are 
retrieved from u as follows: 



/. = 
c. = 



V «2-«2 

fxl\lu2 - 



(3 .20) 



This final step consisting of de-embedding the intrinsic 
parameters from the vector u is equivalent to a Choleski 
decomposition of the matrix K in order to retrieve K . 

If A is not invertible, then the camera model needs to be 
reduced. A similar situation occurs when only one face of the 
rectangular parallelepiped is known to be square. In that case. 
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one of the last two constraints of (3.15) has to be dropped, 
leaving only 4 equations. A first model reduction consists of 
setting the skew factor a = 0, and keeping as unknowns the two 
focals (fx^fy) and camera center (c^.Cy) . That approximation is 
very reasonable for most cameras currently available. The 
resulting camera matrix K has the form: 



K = 



0 0 1 



(3.21) 



leading to the following K K matrix: 



1 0 

0 Ujfyf -Cyifjfyf 



1 0 -W3 

0 U2 —U4 



Then, each constraint (K/)^ (K^K) {Vf) = 0 may be written in the 
form of a linear equation in n = [u, ... ^4]^ : 



[ -c,Cj - b,bj ( a^Cj + ajCi ) (b,Cj + b^Ci) 



(3 .22) 



resulting in a 4 x 4 linear system Au = b , admitting the 
solution w if ^ is rank 4. The intrinsic camera parameters 
ifx^fy^^x^^y) then be computed from the vector u following the 

set of equations (3.20) setting a = = 0 . When A has rank less 
than 4, the camera model needs to be further reduced (that is the 
case when only 3 orthogonality constraints are available) . A 

second reduction consists of using a single focal /. = = , 
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leading to a 3 DOF model. In that case, the K matrix takes on 
the following reduced expression: 



K = 



1 0 
0 1 



l//c 0 -cjf, 

0 l//c -Cy/fo 

0 0 1 



— c„ 



_1_ 



1 0 -U2 

0 1 -uj 

—■112 — '"3 ^^1 



Then, each constraint listed in (3.15) can be written in the form 

of a linear equation of the variable u = [m, U2 u-^]^ : 

(Vi"/ (K^'K) (Yf) = 0 <^ [ -cfij ( UiCj + ajd ) (6,0, + bjCi) ] u = (a,a, + bfij) 

Once again, this leads to the linear problem Au = b where A is 

a 5 X 3 matrix, and b a 5 -vector (if all five constraints are 
valid). A least squares solution is in general possible: 

u = (A^Ay^A^b . Notice that if the faces of the rig are known 
mutually orthogonal but not necessarily square, then only the 
three first constraints of (3.15) are enforceable. In that case, 

the linear system is 3 x 3, and its solution is u = A~^b . Once 
the vector u is recovered, the intrinsic camera parameters have 
the following expression: 

fc=fx=fv = V^i - '^l - «3 



= U2 



Cy = U3 



(3.23) 
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When A has rank less than 3, the model needs to be further 
reduced to 2 or 1 DOF by taking the optical center out of the 
model (fixing it to the center of the image) and then going back 
to the original model adopted in the planar rig case (one or two 
focal models) . 

This system can enable decoupling intrinsic from extrinsic 
parameters and derive a set of closed- form solutions for 
intrinsic camera calibration in case of five model orders: 1, 2, 
3, 4 and 5 degrees of freedom. In addition, we stated conditions 
of observability of those models under different experimental 

,^ situations corresponding to planar and three-dimensional rigs. 

The following table summarizes the results by giving, for each 

^?=f model order, the list of parameters we have retrieved explicit 

in expressions for, as well as the minimum structural rig necessary 

Cn to estimate the model: 

r 

f = \ 

CP 

E J 

Q 
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4 • 



Model order 


Parameters 


Calibration rig (minimum required 
structure) 


1 DOF 


f = L=fy 


2D rectangle 


2 DOF 


fx^ fy 


2D square 


3 DOF 


f fx fyy ^xy^y 


3D rectangular parallelepiped 


4 DOF 


fx^fy^^x^^y 


3D rectangular parallelepiped with 
one square face 


5 DOF 


fxy fy^^xy^yy^ 


3D cube 



One could use this explicit set of solutions for calibrating 
image sequences where intrinsic camera parameters are time 
varying. A typical sequence could be a flyby movie over a city 
with buildings. 

In a broader sense, this work provides a general framework 
for approaching problems involving reconstruction of three- 
dimensional scenes with known structural constraints (for example 
orthogonality of building walls in a picture of a city) . Indeed, 
constraints, such as parallelism or orthogonality, find very 
compact and exploitable representations in dual-space. This 
approach may avoid traditional iterative minimization techniques 
(computationally expensive) in order to exploit constraints in a 
scene . 
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Although only a few embodiments have been disclosed in 
detail above, other embodiments are possible. 
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